The basic function accepts a GeoPandas GeoDataFrame and the column name of the field for which you want to create a Dorling cartogram.
Currently, a stopping rule has not been implemented, therefore a trial-and-error approach is required to arrive at a suitable number of iterations. Generally a good solution requires more than 100 iterations. The verbose option prints out the current iteration and displacement, a suitable number of iterations is generally arrived at when displacement converges on zero, but visual inspection of the solution gives a good idea of the acceptability of the cartogram.
As the output is a GeoPandas GeoDataFrame, plotting is straightforward using the built-in .plot() method.
NB The field used for creating the Dorling cartogram should contain values which are positive and greater than zero. It is important that the input data be in a projected (Cartesian) coordinate system. The GeoDataFrame should be structured so that 1 row is 1 record.
The pseudocode for computing the Dorling cartogram is given in Dorling (1996), and is as follows:
For each region:
Calculate the radius of a circle so that its area is proportional to population (1)
While the forces calculated below are not negligible:
For each region (the order of calculation has no effect):
For each region which overlaps with the region:
Record a force away from the overlap in proportion to it
For each region which originally neighboured the region:
Record a force towards it proportional to distance away (2)
If the forces of repulsion are greater than attraction:
Scale the forces to less than the distance of the closest circle
Combine the two aggregate forces for each circle (3)
For each region:
Apply the forces recorded to be acting on each circle to its centroid (4)
The references below gives Dorling's original algorithm in Pascal for computing the cartogram, the python implementation here largely recreates that algorithm.
Dorling D. 1995. Visualizing Changing Social Structure from a Census. Environment and Planning A. 27(3):353-378.
Dorling D. 1996. Area Cartograms: Their Use and Creation. Concepts and Techniques in Modern Geography 59. Geobooks, Norwich. Available: http://www.dannydorling.org/wp-content/files/dannydorling_publication_id1448.pdf Last Accessed: 5/15/2017
In [1]:
# Some required libraries.
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
from shapely.geometry import Point
# import the non-contiguous cartogram function
from cartogrampy.dorling_cartogram import dorling_cartogram
%matplotlib inline
In [2]:
# Get some data for London
# Read in London Boroughs geojson and project to British National Grid.
ldn_boro = gpd.read_file('data/LDN_Boro.geojson').to_crs(epsg=27700)
# Get some population data for London Boroughs
ldn_pop = pd.read_excel('data/Pandas_Lon_Pop.xlsx')
# Merge the population data with the geospatial data
ldn_boro = ldn_boro.merge(ldn_pop, how='left', left_on='GSS_CODE', right_on='New Code')
In [3]:
# Compute Dorling Cartogram
ldn_dorling = dorling_cartogram(ldn_boro,2015,iterations=200,verbose=False)
In [4]:
# Map the outcome
fig, ax = plt.subplots(1,2,figsize=(16,8))
ldn_boro.plot(column = 2015,cmap = 'PuRd', scheme = 'Quantiles', k=4,legend=True, ax=ax[0])
ldn_dorling.plot(color='r',ax=ax[1])
ax[0].axis('equal')
ax[1].axis('equal')
Out[4]:
In [5]:
# Get some data for US States
usstate = gpd.read_file('data/US_State_2016_5m.geojson')
# Set crs
usstate.crs = {'init': u'epsg:4269'}
# Get continental US and project to NAD83 Contiguous US Albers.
usstate = usstate[~usstate['STUSPS'].isin(['AK', 'HI', 'AS', 'PR',
'GU', 'MP', 'VI'])].to_crs({'init': 'epsg:5070'})
# Read in state populations
state_pop = pd.read_excel('data/Pandas_US_Pop.xlsx')
# Merge population data
usstate = usstate.merge(state_pop, how='left', left_on='STUSPS', right_on='Code')
In [6]:
# Adding id_field variable for labelling purposes
us_dorling = dorling_cartogram(usstate,2016,id_field='STUSPS',iterations=350,verbose=False)
In [7]:
# Map the outcome, the mid-west is v. sparsely populated hence the isolated bubbles.
# This actually works quite well.
fig, ax = plt.subplots(1,2,figsize=(16,8))
usstate.plot(column = 2016,cmap = 'PuRd', scheme = 'Quantiles', k=4,legend=True, ax=ax[0])
# label the areas
usstate.apply(lambda x: ax[0].annotate(s=x['Code'], xy=x['geometry'].centroid.coords[0], ha='center',zorder=1),axis=1)
us_dorling.plot(color='r',ax=ax[1],zorder=0)
# label the bubbles
us_dorling.apply(lambda x: ax[1].annotate(s=x['STUSPS'], xy=x['geometry'].centroid.coords[0], ha='center',zorder=1),axis=1)
ax[0].axis('equal')
ax[1].axis('equal')
Out[7]:
In [8]:
# Let's have a look at some other options for starting positions.
# population weighted centers
# Get population-weighted centroid coordinates from US Census website
url = r'https://www2.census.gov/geo/docs/reference/cenpop2010/CenPop2010_Mean_ST.txt'
pop_cent = pd.read_csv(url)
# Convert longitude and latitude fields to shapely point features
pop_cent['pwc'] = pop_cent.apply(lambda x: Point(x['LONGITUDE'],x['LATITUDE']), axis=1)
# Make the dataframe a geodataframe
pop_cent = gpd.GeoDataFrame(pop_cent,geometry=pop_cent['pwc'])
# set coordinate reference system
pop_cent.crs = {'init': u'epsg:4269'}
# Project to NAD83 Contiguous US Albers.
pop_cent = pop_cent.to_crs({'init': 'epsg:5070'})
# Update population-weighted centroids field
pop_cent['pwc'] = pop_cent['geometry']
# Merge with US States file.
# Make matching columns equivalent datatypes
usstate['STATEFP'] = pd.to_numeric(usstate['STATEFP'], downcast='signed')
# Merge population data
usstate = usstate.merge(pop_cent[['STATEFP','pwc']], how='left', on='STATEFP')
# Pass the pwc field to position to initialize Dorling cartogram at population-weighted centroids.
us_dorling_w = dorling_cartogram(usstate,2016,id_field='STUSPS',position = 'pwc',iterations=400,verbose=False)
print('Population-weighted centroids done')
# bounding box centers
us_dorling_c = dorling_cartogram(usstate,2016,id_field='STUSPS',position = 'center',iterations=400,verbose=False)
print('Bounding box centers done')
# representative points
us_dorling_r = dorling_cartogram(usstate,2016,id_field='STUSPS',position = 'rep',iterations=400,verbose=False)
print('Representative point centers done')
In [9]:
# Map the outcomes, different starting positions create slightly different solutions.
fig, ax = plt.subplots(2,2,figsize=(16,16))
usstate.plot(column = 2016,cmap = 'PuRd', scheme = 'Quantiles', k=4,legend=True, ax=ax[0,0])
# label the areas
usstate.apply(lambda x: ax[0,0].annotate(s=x['Code'], xy=x['geometry'].centroid.coords[0], ha='center',zorder=1),axis=1)
# Bounding box centers
us_dorling_c.plot(color='r',ax=ax[0,1],zorder=0)
# label the bubbles
us_dorling_c.apply(lambda x: ax[0,1].annotate(s=x['STUSPS'], xy=x['geometry'].centroid.coords[0], ha='center',zorder=1),axis=1)
ax[0,1].set_title("Bounding Box Centers")
# Representative point centers
us_dorling_r.plot(color='r',ax=ax[1,0],zorder=0)
# label the bubbles
us_dorling_r.apply(lambda x: ax[1,0].annotate(s=x['STUSPS'], xy=x['geometry'].centroid.coords[0], ha='center',zorder=1),axis=1)
ax[1,0].set_title("Representative Point Centers")
# Population-weighted centroids
us_dorling_w.plot(color='r',ax=ax[1,1],zorder=0)
# label the bubbles
us_dorling_w.apply(lambda x: ax[1,1].annotate(s=x['STUSPS'], xy=x['geometry'].centroid.coords[0], ha='center',zorder=1),axis=1)
ax[1,1].set_title("Population-weighted Centroids")
ax[0,0].axis('equal')
ax[0,1].axis('equal')
ax[1,0].axis('equal')
ax[1,1].axis('equal')
Out[9]: